fix: predictCoding on empty ranges returns AAStringSet for REFAA/VARAA (#86) #92
Open
jmg421 wants to merge 2 commits into
Bioconductor#86)

When query has no overlap with the CDS, .localCoordinates() returns a zero-length GRanges. Previously an early return on length(txlocal)==0 caused REFAA and VARAA to be absent from mcols(), returning NULL instead of empty AAStringSet objects. This breaks downstream operations like reverse() and subseq() on the result columns.

Fix:
- Remove early return so the full mcols-building code runs even when txlocal is empty, naturally producing zero-length AAStringSet columns
- Fix GENEID=NA_character_ -> rep(NA_character_, length(txlocal)) so DataFrame() construction works correctly at zero length

Test: extend test_predictCoding_empty to assert REFAA and VARAA are AAStringSet with length 0.
…lassification
Multi-nucleotide variants (MNVs/DBS) can produce VARAA strings like 'P*'
or '*W' where %in% '*' fails to match. Switch to grepl('*', ..., fixed=TRUE)
so any VARAA containing a stop codon is correctly classified as 'nonsense'
rather than 'nonsynonymous'.
Fixes Bioconductor#86. Adds unit test test_predictCoding_nonsense_DBS covering
a DBS that introduces a stop at a codon boundary.
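
The difference between exact matching and substring matching can be seen in base R alone. The snippet below is a minimal sketch, not the package's code: the helper name classify_consequence and the sample VARAA values are hypothetical, and it only distinguishes the 'nonsense' and 'nonsynonymous' labels mentioned above, ignoring every other consequence class.

```r
# Hypothetical VARAA values: a lone stop, two MNV/DBS-style results, a missense.
varaa <- c("*", "P*", "*W", "PW")

# Exact matching only catches a VARAA that is exactly "*".
exact <- varaa %in% "*"

# Fixed-string search catches "*" anywhere in the translated sequence.
# With fixed = TRUE the pattern is taken literally, so "*" needs no escaping.
contains_stop <- grepl("*", varaa, fixed = TRUE)

# Hypothetical helper (not part of the package): label any VARAA that
# contains a stop codon as "nonsense", everything else as "nonsynonymous".
classify_consequence <- function(varaa) {
  ifelse(grepl("*", varaa, fixed = TRUE), "nonsense", "nonsynonymous")
}

print(exact)
print(contains_stop)
print(classify_consequence(varaa))
```

Comparing the first two printed vectors shows which multi-residue VARAA values the exact match misses.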
Problem
When predictCoding() is called with a query that has no overlap with any CDS (e.g. a non-coding variant), .localCoordinates() returns a zero-length GRanges. The early-exit guard at that point returned txlocal directly, before the REFAA/VARAA columns were ever added to mcols(). This caused downstream operations like reverse() or subseq() on those columns to throw errors.

Reproducer from #86:
Fix
Two changes in R/methods-predictCoding.R:
- Remove the early return on length(txlocal) == 0: let execution fall through to the full mcols()-building block, which naturally produces zero-length AAStringSet columns via AAStringSet(rep("", length(txlocal))).
- Fix scalar GENEID: change GENEID=NA_character_ to GENEID=rep(NA_character_, length(txlocal)) so DataFrame() construction is valid at zero length.

Test
Extended test_predictCoding_empty in inst/unitTests/test_predictCoding-methods.R to assert:
- mcols(result)$REFAA is an AAStringSet
- mcols(result)$VARAA is an AAStringSet
- length == 0L

Fixes #86.
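
The scalar GENEID change described under Fix comes down to a column-length mismatch, which can be illustrated without any Bioconductor packages. The sketch below uses base R's data.frame() as a stand-in for DataFrame(). This is an assumption made purely for illustration, and the two constructors are not guaranteed to behave identically. The variable n plays the role of length(txlocal), and the column contents are placeholders.

```r
n <- 0L  # stands in for length(txlocal) when nothing overlaps the CDS

scalar_na <- NA_character_          # always length 1, regardless of n
sized_na  <- rep(NA_character_, n)  # length n, so length 0 here

# Base R refuses to combine a zero-length column with a length-1 column;
# tryCatch() records the failure instead of stopping the script.
scalar_result <- tryCatch(
  data.frame(REFAA = character(n), GENEID = scalar_na),
  error = function(e) "error"
)

# With every column sized to n, construction succeeds and yields zero rows.
sized_result <- data.frame(REFAA = character(n), GENEID = sized_na)

print(length(scalar_na))
print(length(sized_na))
print(nrow(sized_result))
```

Sizing every column from the same length keeps the construction valid for any n, including zero, which is the property the rep() change relies on.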